Language Model and Grammar Extraction Variation in Machine Translation

Authors

  • Vladimir Eidelman
  • Chris Dyer
  • Philip Resnik
Abstract

This paper describes the system we developed to improve German-English translation of news text for the shared task of the Fifth Workshop on Statistical Machine Translation. Working within cdec, an open-source modular framework for machine translation, we explore the benefits of several modifications to our hierarchical phrase-based model, including segmentation lattices, minimum Bayes risk decoding, grammar extraction methods, and varying language models. Furthermore, we analyze decoder speed and memory performance across our set of models and show that there is an important trade-off to be made.

Similar resources

A new model for Persian multi-part words edition based on statistical machine translation

Multi-part words in English are hyphenated, with the hyphen separating the parts. Persian has multi-part words as well. According to Persian morphology, a half-space character is needed to separate the parts of a multi-part word, but in many cases people incorrectly use a space character instead. This common incorrect use of the space leads to some s...

The Relationship between EFL Learners’ Explicit Knowledge of Source Language and Their Translation Ability

The purpose of this study was to investigate the relationship between students' explicit knowledge of grammar and their translation ability. The importance of grammatical knowledge and its effect on translation quality motivated the researcher to run this study and consider grammatical knowledge in Persian as the source language of Iranian students. It is clear that grammar is an area ...

Joshua: An Open Source Toolkit for Parsing-Based Machine Translation

We describe Joshua, an open source toolkit for statistical machine translation. Joshua implements all of the algorithms required for synchronous context-free grammars (SCFGs): chart-parsing, n-gram language model integration, beam- and cube-pruning, and k-best extraction. The toolkit also implements suffix-array grammar extraction and minimum error rate training. It uses parallel and distributed c...

Kriya - An end-to-end Hierarchical Phrase-based MT System

This paper describes Kriya, a new statistical machine translation (SMT) system that uses hierarchical phrases, which were first introduced in the Hiero machine translation system (Chiang, 2007). Kriya supports both a grammar extraction module for synchronous context-free grammars (SCFGs) and a CKY-based decoder. There are several re-implementations of Hiero in the machine translation community, ...

Akamon: An Open Source Toolkit for Tree/Forest-Based Statistical Machine Translation

We describe Akamon, an open source toolkit for tree and forest-based statistical machine translation (Liu et al., 2006; Mi et al., 2008; Mi and Huang, 2008). Akamon implements all of the algorithms required for tree/forest-to-string decoding using tree-to-string translation rules: multiple-thread forest-based decoding, n-gram language model integration, beam- and cube-pruning, k-best hypotheses ex...

Publication date: 2010